ABSTRACT

To achieve substantive as well as procedural
compliance with Public Law 94-142, it must be determined whether
using the formative evaluation system, which is useful for monitoring
the effects of instruction, increases teacher success in developing
student programs. Causal modeling techniques were used to examine the
relationships among implementation of a formative evaluation system,
structure of instructional programs, and achievement for 117 students
in grades 1-7. The Accuracy of Implementation Rating Scale monitored
implementation procedures and the Structure of Instruction Rating
Scale measured the degree of instructional lesson structure students
received. Reading achievement measures were collected three times
over the 5-month period by 31 trained teachers. Measurement,
structure, and achievement were stable across time and measurement
had a short-lived effect on achievement. Measuring student
performance had an early effect on achievement, as did silent reading
practice. Determining the effect of implementation of an evaluation
system, or structure of lessons and student achievement was not
realized via the present analysis. The appendices contain the
Accuracy of Implementation Rating Scale and the Structure of
Instruction Rating Scale. (Author/PN)
Abstract

Causal modeling techniques were used to examine the relationships
among implementation of a formative evaluation system, structure of
instructional programs, and achievement for 117 students in grades
1-7. Measures were collected three times over the five-month period.
All three constructs were stable across time. Measuring student
performance had an early effect on achievement, as did silent reading
practice. Limitations of the study and the need for further analyses
are discussed.
Teaching Structure and Student Achievement Effects of
Curriculum-Based Measurement: A Causal (Structural) Analysis

In recent years greater demands have been placed on educators,
especially special educators, to be accountable for the quality of
educational decisions and the ways in which decisions are made. A
number of criteria to be followed in assessment and decision-making
procedures have been outlined in PL 94-142. Implementation of this
part of the law has proved to be difficult due to the absence of
technical knowledge that would enable schools to comply with the
intent of the law as well as the procedures outlined in the law. In
response to this problem, the Institute for Research on Learning
Disabilities at the University of Minnesota (IRLD), for the past six
years under federal contract, has conducted a program of research and
development that has had as its goal developing a functional system
for developing and monitoring progress on IEP goals, as intended by
PL 94-142.

One objective of this research and development program has been
to determine empirically the effects of teachers using the formative
evaluation system developed by the IRLD on student achievement in
reading, spelling, and written expression. If we are to achieve
substantive as well as procedural compliance with the law (Deno &
Mirkin, 1982) we must determine whether using the formative evaluation
system increases teacher success in developing student programs. In
answering this question our focus has been on the IEP adjustment
decision that teachers make once special education is being provided
for a student. The formative evaluation system is an assessment
device for monitoring the effectiveness of the IEP. (See Figure 1.)
The hypothesis is that if an adequate system of formative evaluation
is developed, teachers may use this system to monitor student progress
and the effectiveness of their instruction. If student progress is
not adequate, then teachers judge their instruction to be ineffective
and modify their instruction in an attempt to improve the student's
progress.



Insert Figure 1 about here 

The rationale underlying this hypothesis rests on a set of
assumptions. First, the success of special education is defined by
the extent to which students' academic and social behaviors are
improved. Second, for any mildly or moderately handicapped student,
it is impossible to reliably identify special educational alternatives
that will be more effective than the regular classroom program. Given
the first two assumptions, the initial IEP then must be viewed as a
guess about what might be helpful to the student rather than a plan
that is guaranteed to help. If the IEP is only a guess, then there is
no alternative but to continuously evaluate the effectiveness of the
IEP and to modify it when it is unsuccessful. Under such conditions,
teachers should be able to increase the success of special education
by systematically measuring student progress toward the achievement of
program goals and then adjusting student programs to enhance that
progress. In a responsive system such as this, student performance
data function as the most useful "vital signs" of whether a program is
working or should be changed. An evaluation system, when effective,
allows teachers to empirically test their best hunches about how to
help students.

One desirable characteristic of a formative evaluation system is
that it be useful for monitoring the effects of any type of
instruction. For example, whether the teacher chooses DISTAR, a basic
sight word method, or any other approach to teach reading, the
monitoring system should accurately measure the student's progress in
reading, and it must be unbiased with respect to various theoretical
approaches to teaching.
Stage One

In order to accomplish the goal of the research and development
program, a three-stage plan was designed (Deno, 1979). Stage One
included: (a) the identification of the behaviors to be measured in
reading, spelling, and written expression, (b) the development of
technically adequate measurement procedures for measuring those
behaviors, and (c) an exploration of alternative approaches (rule
systems) for using the data generated by these measures to make
decisions about the effectiveness of instruction. The studies in
Stage One were intended to lay a foundation for subsequent engineering
of a generic formative evaluation system. Identifying valid simple
measures of student performance was critical since later development
of the evaluation system rested on whether performance data that were
technically adequate could be easily and frequently collected.

Consistent with the intent of the three-stage plan, measurement
and evaluation procedures were developed for three academic areas
(reading, spelling, and written expression). The focus of the present
investigation, however, was the use of the procedures when the IEP
goal was reading. Therefore, the remainder of this introduction is
restricted to reading.

The basic strategy used to identify useful measures involved a
process of elimination. Initially, a pool of five easily measured
reading behaviors was generated through a review of the available
literature. The behaviors measured in reading included: (a) reading
isolated word lists; (b) reading isolated words in context; (c)
reading aloud from text; (d) identifying deleted words in text; and
(e) giving word meanings (Deno, Mirkin, & Chiang, 1982). The next
step was to develop simple standardized measurement procedures.
Specific directions were devised that could be used routinely to
conduct assessment. These specifics included how to choose a sample
and provide directions to the student. The third step was to
determine the criterion validity of the measurement procedures by
correlating the scores obtained from them with scores on commercially
available standardized measures, with program placement, and with
grade level. The measures that were not reliable or valid, or those
that were deemed less acceptable with respect to any other desired
characteristics, were eliminated from the pool.

The results of the criterion validity research led to the
conclusion that reading aloud from a basal text is an optimal behavior
to measure in reading. The rationale for this selection includes the
fact that reading aloud provides a broader range of scores than
isolated words and relates somewhat more closely to comprehension. In
addition, reading aloud requires little teacher preparation since a
teacher can simply randomly select a passage and direct a child to
read aloud. The procedures for measuring reading aloud have been
detailed elsewhere (Mirkin, Deno, Fuchs, Wesson, Tindal, Marston, &
Kuehnle, 1981) and are described in the procedures section of this
paper.

Once the procedures had been developed for measuring reading, the
next step in Stage One of the research program was to investigate two
procedures for writing objectives. Short-term objectives (STOs) are
based on the long-range goals, which are developed using a formula and
the student's scores from the reading-aloud measure. STOs can be
written so that measurement is on a standard task (e.g., reading aloud
at a specific level of a reading series) or measurement can be based
on a standard criterion applied to sequential tasks (e.g., mastery of
units in a basal reader). A survey of teachers who had used both
procedures for one school year revealed that most teachers preferred
measuring progress in reading through sequential tasks (Fuchs, Wesson,
Tindal, Mirkin, & Deno, 1982).

At the same time, several studies were conducted to examine
various procedures for using the data generated from the
administration of the generic measures. Analyses of student
performance data indicated that students showed greater academic
growth when a data utilization strategy was in effect than when
teachers did not use the data systematically (Martin, 1980; Mirkin &
Deno, 1979; Mirkin, Deno, Tindal, & Kuehnle, 1980). Questionnaires
designed to evaluate teacher satisfaction with two alternative
data-utilization strategies revealed that teachers preferred to use a
combination of the two strategies over using either strategy alone
(Fuchs et al., 1982). This finding contributed to the design of the
data-utilization strategy employed in Stage Three studies. This
strategy is described in the procedures section of this paper.
Stage Two

Stage Two consisted of improving the logistical feasibility
(Lovitt, 1977) of the formative evaluation system, as measured by
teacher efficiency and satisfaction. No system of formative
evaluation would be useful if teachers found it to be too time
consuming or if they were dissatisfied with other aspects of the
system. Without efficiency and teacher acceptance, the formative
evaluation system probably would not be used regardless of its value
in monitoring student progress.

A series of field tests was conducted with a cooperating school
district. The results indicated that with practice and systematic
attempts to reduce measurement time, teachers were able to increase
their efficiency 15 times over. At the end of the study teachers
required on the average only two minutes to prepare for measurement,
conduct a one-minute assessment, and score and graph the results
(Fuchs, Wesson, Tindal, Mirkin, & Deno, 1981). These teachers were
also highly satisfied with the evaluation procedures. When questioned
by independent evaluators the teachers stated that: (a) the system
eliminated much of the jargon, ambiguity, and vague descriptions once
found in IEPs; (b) the system met the real intent of the law; (c)
their own testing was now relevant to the instruction being provided
in the classroom; (d) they were confident in the reliability of their
tests, making decisions easier and meetings shorter; (e) their testing
was more meaningful because a student is compared with peers from
his/her own school and grade level; (f) the students were more aware
of their own progress because of the frequent charting required by the
data-based system; (g) their ability to measure the effectiveness of
their teaching strategies with any particular student was improved;
and (h) the system made writing IEPs much easier (Wesson, Deno, &
Mirkin, 1982). These results clearly suggest that this monitoring
system not only is logistically feasible, but, in fact, has practical
advantages.
Stage Three

Stage Three of this research and development plan brings the
focus of research back to the primary goal: to determine the effects
of teachers' use of formative evaluation on student achievement.

This paper is a report of the relationships among the degree of
implementation of the formative evaluation system, the amount of
structure in the student's instructional program, and the student's
rate of academic progress that were obtained during a one-year
training and implementation of the formative evaluation system. (See
Figure 2.) The hypothesis tested was that the extent to which
teachers implement the evaluation system influences the degree to
which their teaching is structured, and that structure, in turn,
influences the extent to which students demonstrate academic progress.
Therefore, the following research questions were addressed:

(1) How well do teachers implement this formative evaluation
system given the brief training that was provided?

(2) Is there any relationship between the extent to which
the evaluation system was implemented and the degree of
structure of the students' instructional programs?

(3) Is there any relationship between the extent to which
the evaluation system was implemented and the amount of
student achievement?

(4) Is there any relationship between the degree of
structure of the students' instructional programs and
the amount of student achievement?



Insert Figure 2 about here



Method 

Subjects 



A total of 31 teachers participated in this study. In this
group, there were 26 females and 5 males. On the average, they had
1.9 years of experience teaching regular education and 8.8 years
teaching special education. The greatest percentage of teachers (39%)
had no experience teaching regular education; 23% had taught special
education for one to three years.

There were 117 students included in the study. Their ages ranged
from 6 to 13 years, with an average age of 9.5. There were 92 males
and 23 females (the sex of two subjects was uncoded) in grades 1-7.
The greatest numbers of students were in grades 2-5 (20, 26, 25, and
25, respectively). In grade 1, there were five students, in grade 6,
there were nine students, and in grade 7, there were only two
students. The students included in the study were, for the most part
(111 of the 117), provided with special education in resource rooms.



Measures

Three major types of measures were employed in this study.
First, the measure of the degree of implementation of the monitoring
system was included since it was critical to know how accurate and
complete teachers are in using the evaluation system. Second,
measures indicating the degree of structure of the students'
instructional programs were included. These measures are useful in
determining how the evaluation system influences teaching practices.
The third set of measures were student achievement indices. Most of
these measures were administered three times during the five-month
study.

Implementation variables. The Accuracy of Implementation Rating
Scale (AIRS) is an instrument that was developed in conjunction with
the manual Procedures to Develop and Monitor Progress on IEP Goals
(Mirkin et al., 1981), which was used for teacher training in this
study. The AIRS is designed to provide a format by which to monitor
the implementation of the procedures described in the manual. The
AIRS consists of 12 items rated on a 1 to 5 scale, 1 being the lowest
implementation score and 5 being complete and accurate implementation.
A complete list of the items and their operational definitions can be
found in Appendix A.

Items 1 and 2 of the AIRS, which require direct observation, deal
with the accuracy of administration of the measurement and selection
of the stimulus materials. Items 3-12 of the AIRS require inspection
of various written documents. Specifically, the rater examines the
following documents for each student: (a) the Individualized
Educational Plan (IEP), which should specify the long-range goal and
short-term objective in reading; (b) the reading graph; (c) the
instructional plan for reading; and (d) the record of changes made in
the instructional plan in reading. Factors included in items 3-12
pertain to accuracy of establishing: (a) the appropriate measurement
level, (b) an adequate baseline, (c) an accurate long-range goal and
short-term objective, (d) a detailed graph, (e) a complete
instructional program, and (f) the aimline. These items also focus on
the timing of instructional changes as well as the types of changes
made. (See Appendix A.) The AIRS was used to assess the degree of
implementation at the beginning, mid-way, and at the conclusion of
this study.

The interjudge agreement for the AIRS ranged from .92 to .98 when
percentage of agreement was based on a within-one-point rating match.
The percentage of exact agreement ranged from .73 to .91.
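
The two agreement indices reported for the AIRS can be illustrated with a short sketch. The function name and the ratings below are hypothetical and are not taken from the study; the sketch only assumes that two judges rate the same 12 items on the 1 to 5 scale and that agreement is the proportion of items on which their ratings match exactly or differ by no more than one point.

```python
def agreement_rates(rater_a, rater_b, tolerance=1):
    """Return (exact, within_tolerance) agreement proportions for two raters.

    rater_a and rater_b are equal-length lists of item ratings (1 to 5).
    The names and the tolerance parameter are illustrative assumptions.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty item set")
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    within = sum(abs(a - b) <= tolerance for a, b in pairs) / len(pairs)
    return exact, within


# Hypothetical ratings of the 12 AIRS items by two judges.
judge_1 = [5, 4, 3, 2, 1, 5, 4, 3, 2, 1, 5, 4]
judge_2 = [5, 4, 2, 2, 1, 5, 3, 3, 2, 1, 5, 1]
exact, within_one = agreement_rates(judge_1, judge_2)
print(f"exact: {exact:.2f}, within one point: {within_one:.2f}")
```

Because a within-one-point match is a weaker requirement than an exact match, the first index can never be lower than the second, which is consistent with the ordering of the two ranges reported above.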

Structure variables. The Structure of Instruction Rating Scale
(SIRS) was designed to measure the degree of structure of the
instructional lesson that a student received. In this study, the
focus was on structure during reading instruction. The variables
chosen for inclusion on the SIRS were gathered from current literature
on instruction and student academic achievement (cf. Stevens &
Rosenshine, 1981). A list of the variables and their operational
definitions can be found in Appendix B. Observations were conducted
at three different points in time during the study.

The SIRS consists of 12 five-point bipolar rating scales. A
rating of 1 is low for the variable and 5 is high. Observers, trained
by videotape to a criterion of .80-.90 inter-rater agreement, rate all
variables on the basis of strict definitions at the end of a 20-minute
observation period. For the present study, nine research assistants
were trained as observers; they reached an inter-rater agreement level
of .92 before actually observing in classrooms. The focus of each
observation period for the SIRS is on the instructional environment
for one student at a time. (See Appendix B.)

The reliability of the SIRS was assessed by means of Coefficient
Alpha, a measure of internal consistency. For a sample of 70 students
observed in November 1981, the average inter-item correlation was .37,
resulting in an alpha of .86. Thus, the SIRS seems to have a high
degree of reliability as indexed by a homogeneity measure.
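
As a rough illustration of how Coefficient Alpha relates to the number of items and their intercorrelation, the sketch below gives two standard forms: alpha computed from raw item scores, and the standardized form based on the average inter-item correlation. The function names and the toy data are assumptions for illustration. The source does not say which form produced the reported value; for 12 items and an average correlation of .37 the standardized form gives about .88, so the reported .86 was presumably computed from raw item variances or unrounded values.

```python
from statistics import variance


def cronbach_alpha(score_rows):
    """Coefficient alpha from raw scores.

    score_rows holds one list per student, with one score per item.
    Uses alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)).
    """
    k = len(score_rows[0])
    item_columns = list(zip(*score_rows))
    item_variances = [variance(col) for col in item_columns]
    total_variance = variance([sum(row) for row in score_rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def standardized_alpha(n_items, mean_inter_item_r):
    """Standardized alpha from the average inter-item correlation."""
    return n_items * mean_inter_item_r / (1 + (n_items - 1) * mean_inter_item_r)


# Figures quoted in the text: 12 SIRS items, average inter-item r of .37.
print(round(standardized_alpha(12, 0.37), 3))

# Toy data (hypothetical): three items that rise and fall together.
print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))
```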

Achievement measures. At three different points in time during
the study, three one-minute oral reading measures, consisting of
randomly selected passages from the third-grade level in Ginn 720,
were administered to the students. These measures were selected based
on their technical adequacy (Deno et al., 1982) and sensitivity to
change (Marston, Lowry, Deno, & Mirkin, 1981). These simple measures
are as reliable and valid as traditional standardized tests and yet
are more likely to reflect small increments of improvement. The
measurements were conducted by directing students to begin reading at
the top of the page and continue reading for one minute, at which time
the examiner would say stop. If they came to a word they did not
know, the examiner would supply the word and prompt them to continue.
While the student was reading, the examiner followed along on a copy
of the passage and marked errors of substitution and omission.
Following the reading, the numbers of words read correct and incorrect
were counted and recorded, with no feedback given to the student.
These three reading measures were given at the beginning of the study
(pretest), in the middle, and immediately following the final
observation (posttest).

Two subtests from the Stanford Diagnostic Reading Test (Karlsen,
Madden, & Gardner, 1976) also were given as posttest measures. The
Structural Analysis and Reading Comprehension subtests were
administered along with the final reading passage measures. Each of
these subtests has two parts, with Structural Analysis focusing on
syllabication (blending and division) and Reading Comprehension
focusing on answering both literal and inferential questions for
previously read passages.

Procedures

All teachers were trained to carry out a specific set of
procedures, including establishing an appropriate measurement level,
writing long-range goals (LRGs) and short-term objectives (STOs),
collecting three oral reading scores per week for each student,
plotting the scores on a graph, and using the data in making decisions
about the effectiveness of students' instructional programs.

Measurement. Reading measurement consisted of one-minute timed
samples of reading from the student's curriculum. Both words correct
and incorrect were scored and charted on equal interval charts. The
level of stimulus material for testing, which also became the
baseline, was selected as the level from which the student could read
aloud between 20-29 words per minute for grades 1 and 2, and 30-39
words per minute for grades 3-6.

Writing goals. Teachers were instructed to write long-range
goals for the student's IEP using both the entry level criterion and a
desired year-end mastery criterion, usually 70 words correct per
minute with no more than 7 errors. The format used in writing the
long-range goal is shown in Figure 3.



Insert Figure 3 about here

Writing objectives. Two types of short-term objectives were
written, performance and mastery; both were based on the long-range
goals. For performance objectives, in order to compute the short-term
objective, teachers first subtracted the baseline level of performance
from the criterion level listed in the LRG. Dividing this difference
by the number of weeks necessary until the annual review, they arrived
at the number of words per week gain necessary to meet the long-range
goal criteria. In performance measurement, the measurement task is a
random sample of items from a constant set of stimuli, and the goal is
to improve the level of performance on that stimulus material. In
graphing performance measurement, the horizontal axis represents
successive school days and the vertical axis represents the level of
performance on a constant measurement task; each data point represents
the level of proficiency on that constant measurement task. The line
of best fit through the data points depicts the student's rate of
improvement in performance on the set of stimulus material.
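
The arithmetic behind a performance objective can be sketched as follows. The function names and the example student are hypothetical; the only rule taken from the text is that the weekly gain equals the LRG criterion minus the baseline, divided by the number of weeks until the annual review.

```python
def weekly_gain(baseline_wpm, criterion_wpm, weeks_to_review):
    """Words-per-week gain needed to move from baseline to the LRG criterion."""
    if weeks_to_review <= 0:
        raise ValueError("weeks_to_review must be positive")
    return (criterion_wpm - baseline_wpm) / weeks_to_review


def expected_score(baseline_wpm, gain_per_week, week):
    """Score the student should reach by a given week if the gain is met."""
    return baseline_wpm + gain_per_week * week


# Hypothetical student: baseline of 35 words correct per minute, criterion
# of 70 words correct per minute, 28 weeks until the annual review.
gain = weekly_gain(35, 70, 28)
print(gain)                         # words per week
print(expected_score(35, gain, 8))  # expected level after 8 weeks
```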

When writing mastery-based short-term objectives, teachers
backtrack through the reading curriculum to find the level at which
the student reads at the mastery rate designated in the long-range
goal. The pages or stories between this baseline level and the goal
level are counted and divided by the number of weeks until the annual
review. This number becomes the criterion used in the STO specifying
the average weekly progress necessary to meet the LRG. On the graph,
the horizontal axis again represents school days and the vertical axis
represents successive segments, pages, or stories of the curriculum
mastered. Each data point represents the number of curriculum
segments mastered through a given day. The line of best fit through
the data points depicts the rate of student progress through the
curriculum. The purpose of repeated mastery assessment is to assess
the student's rate of mastery in the curriculum, and the purpose of
the graph is to display that rate of curriculum mastery. The teacher
measures the student on a representative sample of material from the
current instructional curriculum unit and plots that level on the
graph until mastery is achieved. At that point (a) the teacher
registers on the student's graph that a curriculum unit has been
mastered, and (b) the set of reading stimulus material on which the
teacher measures the student progresses to the next segment in the
hierarchy. The two formats used for writing short-term objectives are
listed in Figure 4.
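
A mastery objective and its graph can be sketched in the same spirit. The names and numbers below are hypothetical; the sketch assumes only what the text states, namely that the weekly criterion is the count of pages or stories between the baseline and goal levels divided by the weeks until the annual review, and that each graphed point is the number of segments mastered through a given school day.

```python
def weekly_mastery_rate(segments_between, weeks_to_review):
    """Average number of segments to master per week to meet the LRG."""
    if weeks_to_review <= 0:
        raise ValueError("weeks_to_review must be positive")
    return segments_between / weeks_to_review


def cumulative_mastered(mastery_days, last_day):
    """Graph points (school day, segments mastered through that day).

    mastery_days lists the school day on which each segment was mastered.
    """
    points = []
    for day in range(1, last_day + 1):
        count = sum(1 for d in mastery_days if d <= day)
        points.append((day, count))
    return points


# Hypothetical case: 48 stories between baseline and goal level, 32 weeks.
print(weekly_mastery_rate(48, 32))

# Hypothetical record: segments mastered on school days 2 and 5.
print(cumulative_mastered([2, 5], 6))
```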



Insert Figure 4 about here



Data utilization. In addition to measuring and writing goals and
objectives, the teachers were trained in the use of the measurement
procedures for evaluation of the instructional program. In order to
monitor student growth, the baseline reading level and the long-range
goal were connected by an aimline that showed the student's desired
progress. Every seven data points, the teachers were to monitor
student growth by means of the split-middle or quarter-intersect
method (White & Haring, 1976). An example is given in Figure 5. If
the student was progressing at a rate equivalent to or greater than
that indicated by the aimline, the instructional program was
continued; if the projected rate of growth was less than that
indicated by the aimline, teachers were directed to make a major
change in the student's instructional program.
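
The decision rule can be sketched with a simple quarter-intersect trend line. This is a minimal illustration, not the procedure as specified in the training manual: the function names and data are hypothetical, and the sketch assumes the common convention of leaving the middle point out of both halves when the number of points is odd. The split-middle refinement shifts the line up or down without changing its slope, so it would not alter the comparison with the aimline made here.

```python
from statistics import median


def quarter_intersect(days, scores):
    """Trend line (slope, intercept) through the medians of each half.

    days must be strictly increasing. With an odd number of points the
    middle point is excluded from both halves (an assumed convention).
    """
    n = len(days)
    if n < 4 or n != len(scores):
        raise ValueError("need at least four matched data points")
    half = n // 2
    x1, y1 = median(days[:half]), median(scores[:half])
    x2, y2 = median(days[n - half:]), median(scores[n - half:])
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def program_decision(days, scores, aimline_slope):
    """Continue if the trend meets or exceeds the aimline, else change."""
    slope, _ = quarter_intersect(days, scores)
    return "continue program" if slope >= aimline_slope else "change program"


# Hypothetical seven data points (school day, words correct per minute).
days = [1, 2, 3, 4, 5, 6, 7]
scores = [30, 32, 31, 35, 36, 38, 37]
print(quarter_intersect(days, scores))
print(program_decision(days, scores, aimline_slope=0.25))
```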

Insert Figure 5 about here



Teacher Training

Three formats were used to train teachers in these procedures.
For 10 teachers in one special education cooperative, training in the
use of the measurement procedures took place in a series of three
half-day workshops at the beginning of the school year. Teachers also
were provided with the manual, Procedures to Develop and Monitor
Progress on IEP Goals (Mirkin et al., 1981), which detailed all the
activities teachers were to do. In addition, visits by observers in
December, February, and May, and frequent phone contacts, provided
feedback to the teachers on the accuracy of their implementation of
the measures.

In two other districts, training was conducted by district
personnel with the aid of the same manual. In November, three people
designated by each district as trainers participated in a one-day
trainer's workshop. At this time the procedures were reviewed for the
trainers and they were given trainer's manuals that specified
activities for them to use when teaching the monitoring procedures to
the teachers. After this trainer's workshop, the trainers set up and
conducted a series of training sessions in their own districts.
Questions about the procedures usually were forwarded to IRLD
personnel. On-going phone contact facilitated the training process.

The last type of teacher training involved 10 teachers from a
rural special education cooperative that had served as a pilot site.
These teachers were trained during one week of full-day workshops
prior to the 1980-81 school year and during monthly, half-day
workshops throughout the year. These workshops were conducted by IRLD
staff and, prior to February, their focus was on training the teachers
to (a) write curriculum-based IEPs, (b) create a curriculum-based
measurement procedure including mastery and performance systems, (c)
measure frequently and graph student progress toward IEP goals, and
(d) develop strategies to improve the feasibility of implementing the
frequent measurement systems. By February, each teacher had developed
curriculum-based IEPs for at least two students and was measuring and
graphing those students' reading performance at least three times each
week. In February, the data-utilization systems were introduced to
the teachers. The remainder of the workshops consisted of teacher
presentations of their graphs and discussions of student progress and
changes in instructional plans.
Data Collection 

Throughout the year, specific data were compiled by each teacher
and sent to an IRLD staff member who was designated as the contact
person. Data collection took place on three occasions, separated by
approximately two months each, and was synchronized with the SIRS and
AIRS observations.

Each teacher compiled a packet for each student in the study
consisting of the following forms: (a) SIRS; (b) AIRS; (c) Graph; (d)
IEP (IRLD form); (e) Instructional Plan (IRLD form); (f) Changes in
Instructional Plan (IRLD form); (g) Student Information Sheet; and (h)
3rd Grade Passage Scores.

To insure confidentiality, each student was assigned an ID number
and names were removed before the documents left the district. The
information obtained from the teachers was gleaned by research
assistants according to the implementation, structure, and achievement
variables. On the last round of data collection, teachers were sent
the Stanford Diagnostic Reading Tests along with the standard set of
forms.
Observer Training

In order to collect SIRS data and rate items 1 and 2 on the AIRS,
observations of each student during reading class were necessary.
Staff members (lead teachers, program coordinators) from two locations
involved in the research carried out the necessary observation
procedures in their districts. These observers were trained during
one half-day session by two IRLD staff members. A brief review of the
research design was provided at the onset of the training. The
primary focus of the training was on actual observation procedures
required of the observers throughout the year, particularly proper use
of the Structure of Instruction Rating Scale (SIRS) and the Accuracy
of Implementation Rating Scale (AIRS).

Explanation of the SIRS included its history and rationale, its
purpose, and its administration procedures. Each item on the scale
was discussed in detail, including definitions for and examples of
several ratings per item. After the SIRS was explained, two
videotapes were used as a training aid to give the observers a chance
to practice their skills. The tapes consisted of two resource room
situations, one demonstrating a model teacher and the other more
indicative of a teacher who would receive lower ratings on the SIRS.
Each item on both tapes was rated by each observer and an IRLD staff
member and discussed. An inter-rater agreement of .80 was required of
the observers before the session ended.

The AIRS training consisted of explanations of the two items on
the scale that the observers would be rating. The final portion of
the training involved the organizational aspect of the data
collection. A list of documents that were to be collected at the time
of each observation was drawn up and explained. Throughout the year,
an IRLD staff member was in contact with the observers on a weekly
basis to insure understanding and consistency of the procedures and to
answer any questions.

In the other two study sites, trained IRLD staff members
conducted the observations. Nine observers were used in one district
and four in the other. Training of these observers was similar to the
training of the district personnel. The videotape and code book were
presented and ratings were practiced until the interobserver agreement
criterion was reached.

Results 

The data reported for this study are correlational, limiting any
interpretation to statements about the direction of relationships.
However, causal modeling techniques provide a method for going beyond
descriptions of correlations to making inferences about the logic of
directional hypotheses. These findings still cannot be used to prove
causality but, if theoretically justifiable sequencing of variables
is possible, can test the plausibility of a particular causal model.

The causal modeling analysis is basically a data reduction
technique that uses flexible confirmatory factor analysis techniques
to display plausible patterns of causal relationships between
variables. This approach is called "maximum-likelihood analysis of
structural equations" (MLASE). Analysis is facilitated via a computer
program, Linear Structural Relations (LISREL), which "simultaneously
estimates relations between observed measures and underlying
dimensions, relations among the underlying dimensions, and residual
variances for dependent underlying dimensions" (Maruyama, Rubin, &
Kingsbury, 1981, p. 966).

The constructs of interest in the present causal model analysis
were: (a) the implementation of a data-based program modification
system for reading performance (Implementation); (b) structure of
instructional programs for specific students (Structure); and (c)
reading achievement (Achievement). These constructs are part of the
theoretical model displayed in Figure 6 and are described in the
methods section under Measures. (See Figure 6.) In this model, which
is longitudinal in design, each construct is viewed as being caused by
the concurrent constructs and the constructs that temporally preceded
it. Within time periods, patterns of influence are hypothesized to go
from Implementation to Structure, Implementation to Achievement, and
Structure to Achievement. For example, Structure 3 is a result of
Implementation 2 and 3, Structure 2, and Achievement 2. Finally, at
Time 3, the scores from the Stanford Diagnostic Reading Test do not
cause any other constructs but are caused by all Time 2 and 3
constructs.

Insert Figure 6 about here

In order to analyze data using a causal modeling approach,
several methodological limitations must be kept in mind. First, if
every variable of interest is included in the causal model, the model
will be far too complex to analyze, for there will be more relations
than can be specified accurately. Thus, it is important to restrict
the model to those variables that the researchers view as crucial.
Leaving out some potential contributors to the model may lead to an
incorrect interpretation of the phenomenon of interest. Second, only
those variables that demonstrate reliability should be included in a
causal model since unreliable variables may lead to faulty inferences.
Therefore, variables that may be most interesting to researchers must
be left out of the model if they cannot be measured reliably. As a
result, important information may not be accessible using a causal
modeling approach.

Given these limitations, this approach has two components: (a)
the path analysis, which includes all the variables of interest but
sacrifices reliability; and (b) the MLASE analyses, which improve
reliability but sacrifice some of the critical variables. Each of
these components is described later. Prior to these analyses, factor
analyses were conducted on the AIRS and SIRS variables in order to
establish which variables fit into separate factors.

A factor analysis of the items of the AIRS revealed that 6 of the
12 represented one factor. These six included the items referring to
baseline, aimline, instructional level, graph set-up, short-term
objective, and long-range goal. These items involve start-up
activities that teachers must do in order to begin using the
monitoring system; thus, they logically fit into one factor
representing measurement. This factor was used in the MLASE analysis.
One consequence of this factor analysis was that many of the variables
that are crucial to full implementation of data-based program
modification were left out of the MLASE analysis. Specifically, the
items aimed at assessing implementation of the procedures teachers use
to evaluate student progress and then change the student's program
(Timing of Instructional Changes, Clear Changes, Substantial Changes)
were not included in the analysis because MLASE analyses require
reliability of the variables used. Timing of Instructional Changes
was added as a construct in the path analysis since it seemed to be
the most critical variable left out of the factor. Because using
multiple measures to assess reliability of the variables used is so
important for the MLASE analyses, none of these items was used in this
analysis.

Factor analysis of the 12 variables on the SIRS revealed that 9
of the 12 defined a single factor that was called Structure. The
three variables that did not load on that factor were Independent
Practice, Positive Consequences, and Silent Practice on Outcome
Behavior. Only the nine variables that defined a single factor were
utilized in the MLASE analyses in this study. However, Silent
Practice and Positive Consequences were included in the path analysis
because additional variables could be added to this analysis and these
two variables seemed to be the most critical variables that were not
included in the factor.

Path Analysis

The results of the path analysis are shown in Figure 7. The beta
weights for the significant paths are given in the figure. Note that
the Implementation construct was renamed Measurement to highlight
the fact that total implementation could not be analyzed insofar as
the evaluation items did not load on the factor. The significant
paths include the paths from Time 1 to Time 2 and Time 2 to Time 3 for
Measurement, Structure, Achievement, and Positive Consequences. Other
significant paths include Measurement to Structure and Measurement to
Achievement (p < .10) at Time 2. Also, the paths from Timing of
Instructional Changes to Structure and to Positive Consequences at
Time 2 were significant. Significant paths at Time 3 include Timing
of Instructional Changes to Structure and to Silent Practice. The
path between Silent Practice and Achievement was also significant (p <
.08). Finally, Achievement 2 and 3 were related significantly to the
Stanford Diagnostic Reading Test.
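The beta weights in a path analysis are standardized regression
coefficients. As an illustration only, and not the computation used
in the study, the sketch below applies the standard two-predictor
formula to hypothetical correlations; none of the numbers come from
Figure 7, and the function name is an assumption.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized regression weights for an outcome with two predictors.

    r_y1, r_y2: correlations of the outcome with predictors 1 and 2.
    r_12: correlation between the two predictors.
    """
    denom = 1.0 - r_12 ** 2
    if denom <= 0:
        raise ValueError("predictors are perfectly collinear")
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


# Hypothetical correlations: Achievement 2 with Achievement 1 (.80) and with
# Measurement 2 (.30); Achievement 1 with Measurement 2 (.20).
beta_prior_achievement, beta_measurement = standardized_betas(0.80, 0.30, 0.20)
```

In this invented case the weight for the prior achievement score is
much larger than the weight for the concurrent predictor, which is
the pattern a highly stable construct tends to produce.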



Insert Figure 7 about here



MLASE 

As was stated earlier, only the main constructs of the model were
used for the MLASE. Those constructs were Measurement, Structure, and
Achievement. In order to retain only those factors with demonstrated
reliability, several indices of implementation and structure were
dropped from the analysis. Since the items dropped were primarily
evaluation items, the implementation construct is defined by
measurement items. For this reason, that construct was renamed
"Measurement." For Measurement, the factor analysis of data on 12
variables revealed that six of these constituted a single factor. The
factor loadings ranged from .40 to .93 (see Table 1). These six
variables included the following items from the AIRS: (a)
Instructional Level (Item 3); (b) Baseline (Item 4); (c) Graph Set-Up
(Item 5); (d) Aimline (Item 6); (e) Long-Range Goal (Item 8); and (f)
Short-Term Objective (Item 9). Two other variables, Substantial
Changes and Clear Changes, loaded on this factor but did not fit
conceptually with the other six. The other six variables pertained to
the initial set-up of the measurement system whereas the changes
variables referred to modification of the instructional program.
Therefore, these two change variables, which also loaded on Factor 3,
were not included in Factor 1. At a later time, Baseline also was
dropped as a variable because teachers in one of the sites had not
been trained to record baseline data on the graphs that were collected
for use with the AIRS. This decision was made to avoid significantly
lowering the number of cases available for analysis. In sum, the
variables that were used as indicators of measurement were Aimline,
Graph Set-Up, Instructional Level, Long-Range Goal, and Short-Term
Objective. Because MLASE analyses are most effective when there are
three or more indicators of each factor, the five variables were
randomly assigned to be included in one of three indicators as
follows: (a) Instructional Level and Aimline; (b) Graph Set-Up; and
(c) Long-Range Goal and Short-Term Objective.



Insert Table 1 about here 



A factor analysis also was conducted for the Structure construct.
A factor analysis of the 12 items from the SIRS revealed that nine of
the variables constituted one factor and three items were not part of
this factor (see Table 2). The three excluded items were Independent
Practice, Positive Consequences, and Silent Practice on Outcome
Behavior. The remaining nine variables were divided randomly into
three indicators. The three indicator sets were: (a) Instructional
Grouping, Teacher-Directed Learning, and Corrections; (b) Active
Academic Responding, Frequency of Correct Answers, and Pacing; and (c)
Demonstration/Prompting, Controlled Practice, and Oral Practice on
Outcome Behavior.
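The report states that the items were divided randomly into three
indicators but does not describe how item ratings were combined
within an indicator. The following sketch assumes a simple random
partition and a mean of the item ratings; the seed, the function
names, and the ratings are hypothetical.

```python
import random
from statistics import mean

SIRS_STRUCTURE_ITEMS = [
    "Instructional Grouping", "Teacher-Directed Learning",
    "Active Academic Responding", "Demonstration/Prompting",
    "Controlled Practice", "Frequency of Correct Answers",
    "Corrections", "Pacing", "Oral Practice on Outcome Behavior",
]


def random_parcels(items, n_parcels, seed=None):
    """Randomly split items into n_parcels groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    # Deal the shuffled items out in turn so group sizes differ by at most one.
    return [shuffled[i::n_parcels] for i in range(n_parcels)]


def parcel_scores(ratings, parcels):
    """Average one student's item ratings within each parcel (assumed rule)."""
    return [mean(ratings[item] for item in parcel) for parcel in parcels]


parcels = random_parcels(SIRS_STRUCTURE_ITEMS, n_parcels=3, seed=1)
ratings = {item: 3 for item in SIRS_STRUCTURE_ITEMS}  # hypothetical ratings
scores = parcel_scores(ratings, parcels)
```

The same helper applied to the five Measurement items would yield
groups of two, two, and one, matching the sizes reported for that
construct.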



Insert Table 2 about here

For the Achievement construct, the factor analysis revealed that
words read correctly and errors for the three passages loaded together
on one factor (see Table 3). Because the error scores basically
mirrored the words-read-correctly scores, collinearity problems
resulted from analyses including both types of scores. Therefore, only
the words-read-correctly scores were used as the indicators of the
Achievement construct.



Insert Table 3 about here

The next step in preparing for the causal model analysis was to
construct a correlation matrix that included the three indicators of
each construct (Measurement, Structure, and Achievement) at the three
data collection points, plus four end-of-the-year scores from the
Stanford Diagnostic Reading Test (Word Blending, Word Division,
Literal Comprehension, and Inferential Comprehension). This 31 x 31
matrix was used to estimate reliability and consistency of the
indicators of the constructs. The indicators for Structure and
Achievement were reliable and stable. The Measurement indicators were
less reliable and stable but still considered useful for further
analyses.
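The 31 x 31 dimension follows from three indicators for each of three
constructs at three data collection points (27 variables) plus the
four Stanford Diagnostic Reading Test scores. A short sketch that
enumerates the variables; the label format is hypothetical:

```python
from itertools import product

constructs = ["Measurement", "Structure", "Achievement"]
indicators = [1, 2, 3]
times = [1, 2, 3]
sdrt_scores = [
    "Word Blending", "Word Division",
    "Literal Comprehension", "Inferential Comprehension",
]

# One label per construct x indicator x time, then the four SDRT scores.
variables = [
    f"{construct} indicator {indicator}, Time {time}"
    for construct, indicator, time in product(constructs, indicators, times)
]
variables += [f"SDRT {score}" for score in sdrt_scores]

n = len(variables)  # 3 * 3 * 3 + 4
unique_correlations = n * (n - 1) // 2  # entries below the diagonal
```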

MLASE analyses were used to estimate the parameters of the model.
The matrix analyzed for the structural equation analysis is a
covariance matrix. Because the data are longitudinal, relationships
between variables must allow for changes over time in the variance of
the variables (e.g., Maruyama et al., 1981). Analysis of standardized
correlation matrices would not be appropriate, since they restrict all
measures to unit variance and thereby do not allow changes in
variability over time.
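To see why standardization hides changes in variability, consider a
hypothetical measure whose standard deviation doubles between two
time points. The sketch below uses invented scores, not study data,
and the helper name is an assumption.

```python
from statistics import mean, pstdev, pvariance


def standardize(scores):
    """Convert scores to z-scores (mean 0, variance 1)."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]


# Hypothetical words-read-correctly scores for five students.
time_1 = [40.0, 45.0, 50.0, 55.0, 60.0]
time_2 = [40.0, 50.0, 60.0, 70.0, 80.0]  # same ordering, wider spread

raw_variances = (pvariance(time_1), pvariance(time_2))
z_variances = (pvariance(standardize(time_1)), pvariance(standardize(time_2)))
```

The raw variances differ by a factor of four, while both standardized
variances equal one, so a correlation matrix would carry no record of
the change.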

The model will be explained in two parts, a measurement model and
a structural model. The measurement model contains the estimated
relations (loadings) between the observed variables and their
constructs, the residual variances for observed variables, and the
covariances between pairs of residuals for the observed variables. As
can be seen from Table 4, all paths were significant, as were most of
the residual covariances.

Insert Table 4 about here

The structural model contains the estimated relations among the
unobserved variables, namely, the paths among the constructs of
interest. The significant paths that form the structural model are
found in Figure 8. (See Figure 8.) All three constructs were very
stable across time. Measurement Time 2 is caused by Measurement Time
1, and Measurement Time 3 is caused by Measurement Time 2. Because of
the high stability of measurement over time, a couple of paths were
dropped from the model since including them caused problems of
collinearity. The paths that were dropped include Measurement 2 to
Structure 3 and to Achievement 3. Similar relationships existed for
Structure and Achievement. Other significant concurrent causal
relationships included Measurement to Achievement at Times 2 and 3.
For Time 2, this relationship was positive and for Time 3, it was
negative. Also, at Time 3, Measurement was related to Structure, and
Achievement 2 and 3 were related to the Stanford Diagnostic Reading
Test scores.

Discussion 

Many findings are consistent between the path analysis and MLASE
analysis. A noteworthy finding is the stability of the three
constructs across time. As in the path analysis, the causal model
analysis indicates that Achievement, Measurement, and Structure are
difficult to impact.

These results are consistent with previous findings that student
achievement is very stable over time (e.g., Bloom, 1964; Maruyama et
al., 1981; McGarvey, 1978). Maruyama et al. (1981)
examined the relationships among achievement, self-esteem, social
class, and ability, using a sample of 715 children aged 9-15. They
found that achievement was very stable. They noted that "not even a
variable such as ability seems to exert any incremental influence on
achievement" (p. 972). The students in the present study fell within
the same age range and stability of achievement was equally
evident. This finding was discouraging, as our hope was to make some
impact on achievement.

Measurement is also stable, which indicates that teachers who
initially learn to implement the measurement procedure accurately
continue to do so, while for teachers who are initially less skillful
in measurement, practice is not sufficient to improve their
measurement skills. Clearly, teachers who do not implement
measurement procedures to criterion as a result of initial training
must be targeted for more intensive training if full implementation of
the system is desirable.

The Structure construct displays the same degree of stability as
that obtained for Achievement and Measurement. Apparently, if a
teacher designs a highly structured program for a student, that
student continues to receive highly structured instruction throughout
the school year. In contrast, students whose instruction is less
structured initially also continue to receive less structured
education throughout the year. Since the hypothesis contained in the
causal model is that implementation of the evaluation system will
increase structure, the hypothesis is not supported by the findings.
However, the failure to measure the evaluation and change components
of the system renders the test of this hypothesis inadequate.

Also common to both analyses is the relationship between
Measurement and Achievement at the middle of the study. As others
have shown (Jenkins, Mayhall, Peshka, & Townsend, 1974), measuring
student performance can result in increased performance. Thus, while
measurement alone is not intended as a sufficient condition for
affecting student performance in the model, measurement alone does
seem to operate directly on achievement. Since the relationship
between Achievement as measured by reading aloud from a basal text and
Achievement as measured by the Stanford Diagnostic Reading Test is
strong and positive, the operation of measurement alone must be
considered an independent variable that will affect reading proficiency
in general. The strong positive relationship between reading aloud
and general reading achievement is consistent with past results (Deno
et al., 1982), which established the validity of reading aloud as a
measure of reading proficiency.

Several important findings deserve to be highlighted. First,
during the study Measurement had a strong effect on two other major
constructs, Structure and Achievement. This was the expected
relationship given the rationale for the data-based program
modification procedures. Troublesome, however, was the fact that
these effects seem to be short-lived and were not manifest at the end
of the study as hypothesized. Perhaps measurement has a short-term
positive effect on Structure and Achievement, but as reactivity
to measurement decreases, more sophisticated procedures (such as
evaluation of student performance data) and adjustments in the
instructional program need to be implemented if the potential benefits
of measurement are to be realized.

This hypothesis receives some support from the beta weight
reported for Timing of Instructional Changes in the path analysis. In
the middle of the treatment period the extent to which teachers
properly timed instructional changes (as indicated by the data) was
negatively related to Achievement. Thus, perhaps measurement
activities are important early on in the implementation of data-based
program modification, but the positive effects of measurement cannot
be sustained unless evaluation procedures also are used.

The lack of an impact of Structure on Achievement is troublesome
for the causal model. In the model, Structure is hypothesized to
directly influence achievement. This lack of relationship may be
because the SIRS does not validly measure the structure variables
affecting achievement that others have identified (Stevens &
Rosenshine, 1981). A more likely reason may be methodological.
Although the SIRS has established validity (Deno, King, Skiba,
Sevcik, & Wesson, 1982), the sampling procedure used in this study
weakens its utility for longitudinal research. Data collected on
structure on three occasions for a total of 45-60 minutes of
instruction over a five- to seven-month period may not be a good
representation of the structure of instruction the student received on
a daily basis over that time period. Evidence of sampling bias is
suggested in comments made by teachers, who indicated that instruction
looked different on days observers were present.

Of special interest is the relationship between silent practice
and achievement found in the path analysis. This relationship has
been obtained previously by Leinhardt, Zigmond, and Cooley (1980) and
Thurlow, Graden, Greener, and Ysseldyke (1982). The consistency of
this finding across researchers provides a firm empirical base for the
proposition that silent reading practice is an activity that
significantly improves general reading proficiency. Sufficient
evidence has been amassed to recommend to teachers that they plan for,
and provide, increased amounts of silent reading practice for students
as a part of their daily reading program. Such a recommendation takes
on increased importance when considered in light of the relatively
small proportion of time actually allocated to silent reading by
teachers, and the small amount of time students actually engage in
silent reading (Leinhardt et al., 1980; Graden et al., 1982; Thurlow
et al., 1982).

With regard to the actual ratings received on the AIRS and SIRS,
there are several interesting points. Basically, teachers were
adequately trained to conduct measurement activities and to write
goals and objectives. The mean ratings for most of the AIRS items
were above 3.5 (5 being complete and accurate implementation). The
items on which teachers received the lowest scores were Instructional
Level, Timing of Instructional Changes, and Substantial and Clear
Changes. It appears that teachers were less successful in
mastering the parts of this system aimed at evaluating student
progress. Not only were the mean ratings lower on these items, but
many teachers made no instructional changes for many students. That
is, the majority of students were instructed with their original plan
for the entire duration of the study.

Mean scores on the SIRS were more variable, ranging from 1.71 to
4.36. The highest mean scores were for items concerning Teacher-
Directed Learning, Active Academic Responding, Frequency of Correct
Answers, and Correction Procedures. The lowest mean scores were on
Positive Consequences, Independent Practice, and Silent Reading.
Teachers used few positive reinforcers other than praise, and seldom
provided feedback during independent practice. Also, relatively
little time was dedicated to silent reading.

While many of the present findings are interesting, several
hypotheses were not tested. First, due to the early factor analysis
conducted on the measure constructed to scale implementation, several
important variables were dropped from this analysis. Thus, while the
analysis tested the hypothesis that measuring student performance and
goal-setting affect structure and achievement, it was not possible to
determine whether data utilization components would affect structure
and achievement. In the general data-based program modification
model, student performance data are to be charted and used to evaluate
the effectiveness of instruction. If instruction is insufficient for
goal attainment, a change is to be made in the student's instructional
program. In such a system, measurement is viewed as necessary but not
sufficient to effect optimal student growth. That evaluation
procedures will affect Structure and Achievement was supported by the
path analysis finding that the timing of instructional changes
affected Silent Practice, which in turn affected achievement. In
addition, it appears that Measurement may impact Structure. In the
causal model, the impact of Measurement on Structure at Time 3 is
consistent with model hypotheses. The routine of measuring student
progress over time apparently results in teachers increasing the
structure of their lessons. If this is the case, then the
evaluation components, if implemented completely, would probably
yield an even stronger causal effect of the continuous evaluation
system.

To summarize, the main conclusions of this causal analysis are
that measurement, structure, and achievement are stable across time
and that measurement has a short-lived effect on achievement. In
addition, silent practice in reading seems to relate to achievement
gains. Finally, the primary goal of determining the effect of the
implementation of an evaluation system on structure of lessons and
student achievement was not realized via the present analysis.
Hopefully, further analyses will achieve this goal.

Table 1

Factor Loadings for the Measurement Variables

                                    Factor 1      Factor 2      Factor 3

Administration                        -.04       [illegible]      .79³
Selecting Stimulus Material            .06       [illegible]   [illegible]
Instructional Level                    .69¹         .18        [illegible]
Baseline                           [illegible]      .07        [illegible]
Graph Set-Up                           .40¹         .04           .34
Aimline                                .58¹        -.13          -.04
Timing of Instructional Changes        .08          .94²         -.32
Long-Range Goal                        .47¹         .31           .31
Short-Term Objectives                  .44¹        -.19           .34
Instructional Plan                     .07         -.13           .37³
Substantial Changes                    .75          .12           .43³
Clear Changes                          .58          .16           .38³

¹Items which load on Factor 1.
²Items which load on Factor 2.
³Items which load on Factor 3.

Note: Substantial Changes and Clear Changes were seen as Factor 3 since
      Factor 1 included items pertaining to the set-up of the measure-
      ment system; both change items are pertinent to using the data
      in an evaluative manner.



Table 2

Factor Loadings for the Structure Variables

                                         Factor 1

Instructional Grouping                     .45¹
Teacher-Directed Learning                  .66¹
Active Academic Responding                 .59¹
Demonstration/Prompting                    .66¹
Controlled Practice                        .70¹
Frequency of Correct Answers               .36¹
Independent Practice                       .20
Corrections                                .55¹
Positive Consequences                      .16
Pacing                                     .64¹
Oral Practice on Outcome Behavior          .52¹
Silent Practice on Outcome Behavior       -.12

¹Variables which load on Factor 1.

Table 3

Factor Loadings for the Achievement Variables

                             Factor 1

Passage 1 - Correct            .79¹
Passage 2 - Correct            .78¹
Passage 3 - Correct         [illegible]
Passage 1 - Error             -.75¹
Passage 2 - Error             -.76¹
Passage 3 - Error             -.73¹

¹Variables loading on Factor 1.

Table 4

Relationships Among Variables Shown in Figure 6

                          Measurement Model

     Loadings             Residuals          Covariance of Residuals

  A    [illegible]       a    .65               ai      .353*
  B    .77               b    .83               bj      .415*
  C    .60               c    .13               br      .503*
  D    .91               d    .13               cr      .124
  E    .87               e    .26               cs      .175*
  F    .84               f    .34               ia      .280*
  G    .95               g    .04               jr      .391*
  H    .93               h    .15               ke      .288*
  I    .78               i    .40               yz      .087
  J    .73               j    .53               aabb    .194*
  K    .71               k    [illegible]       em      .080*
  L    .81               l    .20               aq      .329*
  M    .70               m    .26
  N    .79               n    .56
  O    .92               o    .07
  P    .90               p    .22
  Q    .68               q    .36
  R    .60               r    .85
  S    .60               s    [illegible]
  T    .77               t    .10
  U    .74               u    .38
  V    1.00              v    .40
  W    .95               w    .13
  X    .81               x    1.16
  Y    .90               y    .55
  Z    .74               z    .32
  AA   .86               aa   .42
  BB   .77               bb   .26

Figure 1. Special Education Decision Making Processes

Figure 2. Causal Model Research Design Hypothesis

Condition                        Behavior          Criteria

In ___ (total # weeks) weeks,    student will      at the rate of 50 wpm
when presented with stories      read aloud        or better, 5 or fewer
from Level ___ (#),                                errors
___ (reading series),

Figure 3. Format for Long-Range Goal: Reading

CONDITION                        BEHAVIOR          CRITERIA

Each successive week, when       student will      at an average increase
presented with a random          read aloud        of ___ [(70 or 50 wpm -
selection from ___ (level #                        actual performance) /
from current instructional                         total # weeks remaining
level; same as LRG) of ___                         in school year].
(reading series),


CONDITION                        BEHAVIOR          CRITERIA

Each week, when presented        student will      at the rate of ___
with successive stories          progress          stories per week, maintain-
from ___ (Level #s from                            ing the mastery criteria
current instructional level                        of at least 50 wpm (gr. 1
to annual goal level),                             & 2) with 5 or fewer errors
                                                   and 70 wpm (gr. 3-6) with
                                                   7 or fewer errors

Figure 4. Performance and Progress Charting Short Term Objectives for
          Reading.
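The criterion in the performance-charting objective can be read as a
weekly growth rate: the gap between the mastery rate (50 or 70 wpm)
and the student's current rate, divided by the weeks left in the
school year. The sketch below assumes that reading of the figure and
takes the grade cutoffs from the progress-charting panel; the
function names and the example student are hypothetical.

```python
def mastery_rate(grade):
    """Mastery criterion in words per minute: 50 for grades 1-2, 70 for 3-6."""
    if grade in (1, 2):
        return 50
    if 3 <= grade <= 6:
        return 70
    raise ValueError("mastery criteria are stated only for grades 1-6")


def weekly_increase(grade, actual_wpm, weeks_remaining):
    """Average weekly gain in wpm needed to reach mastery by year end."""
    if weeks_remaining <= 0:
        raise ValueError("weeks_remaining must be positive")
    return (mastery_rate(grade) - actual_wpm) / weeks_remaining


# Hypothetical third grader reading 40 wpm with 20 weeks remaining.
target = weekly_increase(grade=3, actual_wpm=40, weeks_remaining=20)
```

For this invented student the objective would call for an average
gain of 1.5 words per minute each week.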

Figure 6. Theoretical model depicting interrelationships among implementation (IMP), structure of instruction (STRC),
[illegible]. The observed measures are represented by rectangles and the constructs underlying the measures are
represented by circles. Causal paths are illustrated by straight arrows while relationships in which causal
[illegible] are shown by curved, double-headed arrows. The capital letters represent
relations between the observed measures and their corresponding constructs; small letters represent the
variances of the residuals of the observed measures. The paths between the variables are numbered.

Figure 7. Path analysis depicting the relationships between measurement (MSMT), structure (STRC),
achievement (ACH), timing of instructional change (TIC), silent practice (SP), oral practice
(OP), positive consequences (PC), and the Stanford Diagnostic Reading Test (SDRT). The
curved double-headed arrows represent initial relationships and the straight arrows represent
causal relationships.



Figure 8. Structural model depicting the relationships between measurement (MSMT), structure (STRC),
achievement (ACH), and the Stanford Diagnostic Reading Test (SDRT). The curved double-headed arrows
represent initial relationships and the straight arrows indicate causal relationships.

*p < .05

Appendix A
Accuracy of Implementation Rating Scale

School: ____________________          Student: ____________________
Date: ______________________          Teacher: ____________________

Observer (Items 1 and 2): ____________________
Rater (Items 3-13): ____________________

Number of observations prior to rating: ______

Time observation begins: ______       Time observation ends: ______

Time allocated to reading instruction per day: ______

Curriculum used for measurement: Publisher ____________
Series ____________ Level ______

Instructions 

Circle the number that accurately reflects your rating for each
variable. Only one number may be circled per variable. 1 reflects a
low level of implementation and 5 means total implementation of the
Procedures to Develop and Monitor Progress on IEP Goals. See Operation-
al Definitions. Items 1 and 2 require direct observation of the measure-
ment administration. Items 3, 4, 5, 6, and 7 require inspection of the
student graph. Items 8, 9, and 10 require inspection of the student's
IEP form. The Instructional Plan must be inspected to rate item 11.
The Change Record must be inspected to rate items 12 and 13.





 1.  Administering the Measurement Task      1   2   3   4   5
 2.  Selecting the Stimulus Material         1   2   3   4   5
 3.  Sampling for Instructional Level        1   2   3   4   5
 4.  Baseline                                1   2   3   4   5
 5.  Graph Set-up                            1   2   3   4   5
 6.  Aimline                                 1   2   3   4   5
 7.  Timing of Instructional Changes         1   2   3   4   5
 8.  Long-Range Goal                         1   2   3   4   5
 9.  Short-Term Objective                    1   2   3   4   5
10.  Measurement System                      1   2   3   4   5
11.  Instructional Plan                      1   2   3   4   5
12.  Substantial Changes                     1   2   3   4   5
13.  One Clear Change                        1   2   3   4   5
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Operational Definitions 
Accuracy of Implementation Rating Scale 
Administering the Measurement Task 

5 - The measurement task is administered correctly: teacher
brings stopwatch and pencil to measurement area; gives
correct directions for the task; administers the measurement
procedure for one minute; correctly marks the teacher
copy; correctly counts words correct and incorrect; and correctly
plots the data point.

1 - The teacher: forgets necessary materials; does not give
directions; does not time the task accurately; fails to
mark the teacher copy or incorrectly marks errors; miscounts
correct and incorrect words; and inaccurately plots the data
point.

Selecting the Stimulus Material

5 - The teacher has followed these procedures: Uses passages
selected from the level that represents the annual goal.
Observers should record the book from which the passage
was selected and later check this with the long-range goal
level. At this level find the pages in these stories that
do not have excessive dialogue, indentations, and/or unusual
pronouns. Write these page numbers on equal size slips of
paper.

- Put the slips of paper into a drawbag and shake it.
- Randomly pick a slip of paper.

- The page number chosen is the page where the student
begins reading. If the page chosen is a passage that
was read earlier during the week, draw another page
number.

Other completely random procedures are also rated a 5. If,
however, not all passages have an equal chance of being
selected, a 4 rating would be indicated.

1 - The teacher fails to randomly pick the passage or the sample
is taken from a domain which is greater or smaller than the one
indicated in the goal.
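
The drawbag procedure amounts to a uniform random draw from the eligible pages, with a redraw whenever the chosen page was already read that week. The following is a minimal Python sketch under that reading; the function name `draw_passage_page` and its parameters are hypothetical and do not come from the scale itself.

```python
import random

def draw_passage_page(eligible_pages, read_this_week, rng=None):
    """Pick a starting page so that every unread eligible page is equally likely.

    eligible_pages: page numbers written on the slips (hypothetical input).
    read_this_week: pages already read earlier in the week; these are redrawn.
    """
    rng = rng or random.Random()
    already_read = set(read_this_week)
    # Redrawing until an unread page comes up is equivalent to drawing
    # uniformly from the unread pages only.
    candidates = [page for page in eligible_pages if page not in already_read]
    if not candidates:
        raise ValueError("all eligible pages were already read this week")
    return rng.choice(candidates)

# Hypothetical slips; pages 15 and 23 were read earlier in the week.
page = draw_passage_page([12, 15, 18, 23, 31], read_this_week=[15, 23])
print(page)
```

Because every unread page has the same chance of being drawn, a procedure like this would correspond to the "completely random" case described for a rating of 5.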

Sampling for Instructional Level

5 - The teacher has sampled from higher or lower reading levels
to find the level in which the student reads 20-29 wpm
(grades 1 & 2), or 30-39 wpm (grades 3 and up).



1 - The teacher is measuring at a level which is too high or
too low.
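
As an illustration of the sampling criterion, the sketch below checks which sampled levels fall inside the words-per-minute band for the student's grade. It assumes the scores are kept in a dictionary keyed by level; the names `instructional_range` and `levels_in_range` and the sample scores are hypothetical.

```python
def instructional_range(grade):
    """Return the (low, high) words-per-minute band quoted in the definition."""
    return (20, 29) if grade <= 2 else (30, 39)

def levels_in_range(grade, wpm_by_level):
    """List the sampled reading levels whose score falls inside the band."""
    low, high = instructional_range(grade)
    return [level for level, wpm in wpm_by_level.items() if low <= wpm <= high]

# Hypothetical scores from sampling three levels with a fourth grader.
sampled = {"Level 2": 52, "Level 3": 33, "Level 4": 18}
print(levels_in_range(4, sampled))
```

An empty result would suggest that the teacher still needs to sample other levels before measurement begins.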

Baseline

5 - The student's performance has been measured at least 3 times to 
establish a stable baseline. A stable baseline means that all 
data points fall within a range of 10. 

1 - The teacher has not found a level for which a stable baseline 
has been established or has failed to collect 3 data points 
during the baseline phase. 
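
The stable-baseline criterion can be expressed as a short check. This sketch assumes that "within a range of 10" means the highest and lowest scores differ by at most 10, which is an interpretation rather than something the definition states; the function name is hypothetical.

```python
def is_stable_baseline(scores, min_points=3, max_range=10):
    """Return True when there are enough data points and they fall within the range.

    Assumption: the range is inclusive, i.e. max - min <= 10 counts as stable.
    """
    return len(scores) >= min_points and max(scores) - min(scores) <= max_range

print(is_stable_baseline([31, 35, 28]))   # three points, spread of 7
print(is_stable_baseline([31, 45, 28]))   # spread of 17
print(is_stable_baseline([31, 35]))       # only two points
```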

Graph Set-Up 

5 - The graph is accurately set up: the dates are filled in on the
horizontal axis; the vertical axis is correctly labeled "words
read per minute from ____ material"; the units of measurement
are specified; the student's name and subject area are
certified; a key identifies the symbols for correct (.) and
incorrect (x); symbols are placed at the intersection of date
and score; the data points are connected with straight lines;
and absences are recorded on the graph as (abs.).

1 - The graph does not include many of the items mentioned above. 

Aimline

5 - The long-range goal is marked on the graph with an X at the 
intersection of the desired performance level and date of 
attainment and a line of desired progress connects the 
point representing the student's median score of the last 
3 data points from baseline and the LRG.

1 - The long-range goal is not marked on the graph and/or the 
median and LRG are not connected. 
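
To make the aimline construction concrete, the sketch below computes its starting point and slope. It assumes the horizontal axis is measured in weeks and that the baseline median is plotted at week 0; neither detail is given in the definition, and the function name `aimline` is hypothetical.

```python
from statistics import median

def aimline(baseline_scores, goal_score, weeks_to_goal):
    """Return (start, slope) for the line of desired progress.

    start: median of the last 3 baseline data points.
    slope: words per minute to be gained per week to reach the long-range goal.
    Assumption: the start point sits at week 0 and the goal at weeks_to_goal.
    """
    start = median(baseline_scores[-3:])
    slope = (goal_score - start) / weeks_to_goal
    return start, slope

# Hypothetical baseline of four measurements and a goal of 69 wpm in 36 weeks.
start, slope = aimline([28, 35, 31, 33], goal_score=69, weeks_to_goal=36)
print(start, slope)
```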

Timing of Instructional Changes

5 - All the adjustments in the student's program are made at the
appropriate time given the rules for data utilization:

(1) Compare the actual slope based on 7 to 10 data points
to the slope required to attain the Annual Goal.

(2) If the actual slope is equal to, or steeper than, the
Annual Goal slope, continue the program.

(3) If the actual slope is flatter than the Annual Goal
slope, change the program.

1 - None of the adjustments in the student's program are made
at the appropriate time.
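
The three data-utilization rules can be sketched as a small decision function. The definition does not say how the actual slope is computed, so this example assumes an ordinary least-squares slope over equally spaced measurement sessions, with the goal slope expressed in the same per-session unit. The function names are hypothetical.

```python
def least_squares_slope(scores):
    """Ordinary least-squares slope of scores against session index 0, 1, 2, ...

    Assumption: sessions are equally spaced; the source does not name a method.
    """
    n = len(scores)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

def program_decision(scores, goal_slope):
    """Apply rules (1)-(3): continue if the actual slope meets the goal slope."""
    if not 7 <= len(scores) <= 10:
        raise ValueError("the rules call for 7 to 10 data points")
    actual = least_squares_slope(scores)
    return "continue" if actual >= goal_slope else "change"

# Hypothetical scores rising by 2 words per session.
scores = [30, 32, 34, 36, 38, 40, 42]
print(program_decision(scores, goal_slope=1.5))
print(program_decision(scores, goal_slope=2.5))
```

A rater could compare the dates of the teacher's recorded program changes against decisions of this kind, although the rating itself remains a judgment on the 1 to 5 scale.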
8. Long-Range Goal

5 - The long-range goal is accurately written; goal specifies
the number of weeks until next review; stimulus materials
for the goal represent the level in which the student
is performing at entry level criterion; goal specifies
student behavior; goal specifies mastery criterion of
50 wpm with fewer than 5 errors (grades 1 & 2) or 70 wpm
with fewer than 7 errors (grades 3-5) when there are 36
weeks until the annual review. If there are fewer than 36
weeks, the criteria can be lowered proportionately.

1 - The long-range goal contains none of the above criteria. 

9. Short-Term Objective 

5 - The short-term objective is accurately written; stimulus
material and behavior are specified; and the average increase
in performance is the desired performance minus actual
performance divided by the number of weeks until the annual
review.

1 - The short-term objective contains none of the above criteria.
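
The average increase named in the short-term objective is a single division. The sketch below works one hypothetical case; the function name and the sample figures are illustrative only.

```python
def weekly_increase(desired_wpm, actual_wpm, weeks_until_review):
    """Average weekly gain: (desired - actual) / weeks until the annual review."""
    if weeks_until_review <= 0:
        raise ValueError("weeks_until_review must be positive")
    return (desired_wpm - actual_wpm) / weeks_until_review

# Hypothetical student: reads 34 wpm now, goal of 70 wpm, 36 weeks to review.
print(weekly_increase(70, 34, 36))
```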

10. Measurement System 

5 - The teacher has indicated how the material is organized, the 
frequency of measurement, and what is to be recorded on the 
graph. 

1 - The measurement system is not specified.

11. Instructional Plan

5 - The instructional plan includes clear and specific descriptions
of the instructional procedures, the time spent in each activity,
the pertinent materials, the arrangements, and the
motivational strategies.

1 - The instructional plan is unclear and lacks specific descriptions
of the instructional procedures, the time spent in each
activity, the pertinent materials, the arrangements, and the
motivational strategies.

12. Substantial Changes

5 - The adjustments in the student's program are always substantial
(have a good chance of being effective; see Unit XIV).

1 - The adjustments are never substantial. 



13. Clear Change

5 - All the adjustments made introduce only one, clear program
change.

1 - All the adjustments made introduce more than one change
and/or the change is unclear.
Appendix B



Structure of Instruction Rating Scale (SIRS) 

School : Student: 

Date: Teacher : 



Observer: Number of Students in Group:

Number of observations prior to rating: 

Time observation begins: Time observation ends: 

Time allocated to reading instruction per day:

Curriculum used for instruction: Publisher

Series Level

Instructions 

Circle the number that accurately reflects your rating for each
variable. Only one number may be circled per variable. If you are
unable to evaluate a certain variable, mark N/A (not applicable) next
to the left-hand column.

 1. Instructional Grouping                 1  2  3  4  5

 2. Teacher-directed Learning              1  2  3  4  5

 3. Active Academic Responding             1  2  3  4  5

 4. Demonstration/Prompting                1  2  3  4  5

 5. Controlled Practice                    1  2  3  4  5

 6. Frequency of Correct Answers           1  2  3  4  5

 7. Independent Practice                   1  2  3  4  5

 8. Corrections                            1  2  3  4  5

 9. Positive Consequences                  1  2  3  4  5

10. Pacing                                 1  2  3  4  5

11. Oral Practice on Outcome Behavior      1  2  3  4  5

12. Silent Practice on Outcome Behavior    1  2  3  4  5
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SIRS 

Operational Definitions Codebook 



1. Instructional Grouping

5 - 90% or more of the instruction this student receives from the
teacher is on an individual basis.

1 - 10% or less of the instruction this student receives from the
teacher is on an individual basis.

2. Teacher-Directed Learning 

5 - Student's instruction is extremely organized, businesslike,
and teacher is firm in direction and control of activities.
For example, student is presented with questions, student
has material to cover, etc.

1 - Student's instruction is casually organized and very spontaneous.
Teacher is not committed to having the student work
on a particular set of material. Instructional materials do
not determine what activities student engages in and the lessons
change according to problems or mood of this student.

3. Active Academic Responding

5 - The student is actively practicing the academic skills to be
learned more than 75% of the time observed. Specifically, the
student is engaged in oral or written responding to teacher
questions or written material, e.g., reading aloud, answering
questions, writing, or computing. Student rarely is involved
in non-academic conversations with teacher or other students.
Attending to the lesson without responding, such as sitting,
looking, listening, and/or following along in a book does not
apply. The student must make an active, written or oral
response.

1 - The student is actively practicing the skills to be learned
less than 10% of the time observed. Instructional lessons
may be interrupted or shortened to include "process" and other
non-academic activities, e.g., clarifying feelings, opinions,
and working on arts and crafts.

4. Demonstration and Prompting 

5 - Appropriate steps of the desired behavior to be performed are
demonstrated for the student. Student is given an opportunity
to practice the step(s) as teacher provides prompts for correct
behavior that approximates or achieves desired response.

1 - Teacher attempts to teach the student a behavior without using 
demonstration and prompting techniques. 



5. Controlled Practice

5 - Student's practice of material is actively controlled by
teacher who frequently asks questions to clarify that the
student understands what has just been demonstrated. Questions
are convergent (single factual answer) and the student's
answers consistently follow the questions and are
given teacher feedback.

1 - Student is rarely questioned by teacher following demonstration
of new materials. Questions are more divergent (open-ended,
several interpretations) than convergent (single factual
answer). Student's response is not consistently followed by
teacher feedback. The type of questions are such that several
answers are acceptable, i.e., questions are abstract or ambiguous.

Examples: 

If during an oral reading session:

a) the teacher frequently attempts to clarify the material with
convergent questions ("What color hat was John wearing?"), a
5 would be recorded.

b) the teacher asks few questions, most of which are divergent
("What do you think this means?"), a 1 would be recorded.

c) the teacher asks few convergent questions or many divergent
questions, the appropriate rating would be a 3.

6. Frequency of Correct Answers 

5 - Academic lessons are conducted in such a way that the difficulty
of the material allows the student to achieve mean accuracy
of 80% or higher.

1 - Academic material is difficult for student, component steps
are large or unsequenced, and mean accuracy for student is
less than 55%.

(Note: If the student has no opportunity for oral or written response
during the observational period, item 6 would be rated N/A -
not applicable, while items 3 and 5 would most likely be
rated 1.)
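
The two anchor points for item 6 and the N/A rule in the note can be sketched as follows. Only the ratings of 5 and 1 are defined in the codebook, so this example returns `None` for accuracies in between and leaves ratings 2 through 4 to the observer's judgment; the function name and return convention are hypothetical.

```python
def accuracy_anchor(correct, total):
    """Map observed response accuracy onto the defined anchor ratings for item 6.

    Returns 5 for mean accuracy of 80% or higher, 1 for less than 55%,
    "N/A" when the student had no opportunity to respond, and None when the
    accuracy falls between the two defined anchors.
    """
    if total == 0:
        return "N/A"
    accuracy = correct / total
    if accuracy >= 0.80:
        return 5
    if accuracy < 0.55:
        return 1
    return None  # ratings 2-4 are not defined in the codebook

print(accuracy_anchor(17, 20))
print(accuracy_anchor(10, 20))
print(accuracy_anchor(0, 0))
```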

7. Independent Practice 

5 - When engaged in independent seatwork, the student frequently is
monitored by the teacher who assists, clarifies, and praises
the student for academic engaged tasks.

(Note: Independent seatwork is defined here as a student working on an
assigned task for at least 5 minutes. If no such 5-minute
block of time is observed, Item 7 is rated N/A.)



1 - When student is engaged in academic seatwork activities, little
attention is given by teacher who directs seatwork activities
from a distance or engages in work separate from the assigned
seatwork. Teacher is generally not helpful or supportive to
student during independent practice time.

8. Corrections

5 - The student's errors are consistently corrected by the teacher.
When the student either does not respond, responds incorrectly,
or does not respond in unison if the activity is group directed
and requires such responding, the teacher will systematically
attempt to correct the student by asking a simpler question,
refocusing student's attention to elicit correct response from the
student or provide general rules by which to determine the
correct answer 90% or more of the time.

1 - Student's errors are rarely and inconsistently corrected by the
teacher. The student responses are not systematically corrected.
Student's errors are corrected 50% or less of the time.

Example: In oral reading this includes teacher correction of skips
and mispronunciations, or help in sounding out hesitations.

9. Positive Consequences

5 - Positive events (tokens, points, activities, etc.) are given to
the student when performing the desired behavior. When learning
a new skill the student receives positive consequences for
approximations of the desired behavior. Consequences are consistently
received during academic training time. Praise and
compliments, e.g., "good working, nice job," are not included
in this definition.

1 - Student rarely receives positive consequences for academic work.
When student receives consequences they usually are for social
behavior, rather than for behaviors occurring under systematic
academic training.

10. Pacing

5 - The pace of the lesson is rapid, providing many opportunities
for response by the student. As a result, attention is high
and off-task behavior is low.

1 - The pace of the lesson is slow and the student's rate of
responding is low. Lesson format frequently varies, is not
highly structured, and student attention may be low.

11. Oral Practice on Outcome Behavior

5 - Student reads aloud from context nearly all the time (85-100%
or 12-15 min. of a 15 min. observation).

1 - Student does not read aloud during the observation (0% of the
time).

(Note: Reading aloud for measurement purposes should not be considered
when rating this variable. Reading in context is defined as
reading phrases, sentences, paragraphs, or story selections.)



Examples: 

If the student is reading isolated words nearly the entire time,
the appropriate rating is a 3. 

If the student is reading aloud from a text about half the time, 
a 3 would be recorded. 



12. Silent Practice on Outcome Behavior



5 - Student reads silently from context nearly all the time (85-100% 
or 12-15 min. of a 15 min. observation). 

1 - Student does not read silently during the observation (0% of
the time).

(Note: Reading in context is defined the same as in #11. The examples
of #11 are the same for #12, with silent reading.)
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